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Abstract 

rj"! ■ This paper develops a methodology for approximating the posterior first two moments 

of the posterior distribution in Bayesian inference. Partially specified probability mod- 
els, which are defined only by specifying means and variances, are constructed based 
upon second-order conditional independence, in order to facilitate posterior updating and 
prediction of required distributional quantities. Such models are formulated particularly 
for multivariate regression and time series analysis with unknown observational variance- 
covariance components. The similarities and differences of these models with the Bayes 
linear approach are established. Several subclasses of important models, including regres- 
sion and time series models with errors following multivariate t, inverted multivariate t 
and Wishart distributions, are discussed in detail. Two numerical examples consisting of 
£S| . simulated data and of US investment and change in inventory data illustrate the proposed 

C methodology. 
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> ■ 

;rj ■ 1 Introduction 

Regression and time series problems are important problems of statistical inference, which 
appear widely in many science fields, as for example in econometrics and in medicine. Re- 
gression has been discussed in many textbooks (Mardia et al, 1979, Chapter 6; Srivastava 
and Sen, 1990); from a Bayesian standpoint Tiao and Zellner (1964), Box and Tiao (1973), 
Mouchart and Simar (1984), Pilz (1986), Leonard and Hsu (1999, Chapter 5) and O'Hagan 
and Forster (2004, Chapter 9) discuss a variety of parametric regression models, where the 
residuals follow normal or Student t distributions. Recent work on non-normal responses 
includes regression models in the type of generalized linear models (GLMs) (McCullagh and 
Nelder, 1989) and time series models in the type of dynamic GLMs (Fahrmeir and Kaufmann, 
1987, 1991; Fahrmeir, 1992; West and Harrison, 1997, Chapter 12; Fahrmeir and Tutz, 2001, 
Chapter 8; Kedem and Fokianos, 2002; Godolphin and Triantafyllopoulos, 2006). Hartigan 
(1969) and Goldstein (1976) develop Bayesian inference for a general class of linear regres- 
sion problems, in which the parameters or states of the regression equation are estimated 
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by minimizing the posterior expected risk. Goldstein (1979, 1983), Wilkinson and Goldstein 
(1996) and Wilkinson (1997) propose modifications to the Bayes linear estimators to allow 
for variance estimation in regression and time series problems. Such considerations are useful 
in practice because they allow inference to a range of problems that otherwise the modeller 
would need to resort to Monte Carlo estimation (Gamerman, 1997) or to other simulation 
based methods (Kitagawa and Gersch, 1996). West and Harrison (1997, Chapter 4) and 
Wilkinson (1997) discuss how the above mentioned regression estimation can be applied to a 
sequential estimation problem, which is necessary to consider in time series analysis. 

In this paper we propose a modelling framework that allows approximate calculation of 
the first two moments of the posterior distribution in Bayesian inference. This is motivated 
by situations when a model may be partially specified in terms of its first two moments, or its 
probability distribution may be difficult to specify (or it may be specified with uncertainty). 
Partially specified prior posterior (PSPP) models are developed for dynamic situation in 
which a modeller is reluctant to specify a full probability model and yet requires a facility 
for approximate prior /posterior updating on mean and variance/covariance components of 
that model. The basic idea is that a linear function <p(X, Y) of two random vectors, X, Y, 
is second-order independent of the observed value of Y. Then in learning, no matter what 
value of Y is observed, the mean and the variance of (f>(X, Y) takes exactly the same value. A 
further requirement is that the mean and variance of X\Y = y can be deduced by the mean 
and variance of (j>(X, Y). We show that for a class of regression models, linear Bayes methods 
are equivalent to PSPP, while we describe situations where PSPP can provide more effective 
estimation procedures than linear Bayes. We then describe two wide classes of regression and 
time series models, the scaled observational precision (SOP) and the generalized SOP, both of 
which are aimed at multivariate application. For the former model, we give the correspondence 
of PSPP (based on specification of prior means and variances only) with the normal/gamma 
model (based on specification of the prior distribution as normal/gamma) . For the latter 
model, we show that PSPP can produce efficient estimation, overcoming problems of existing 
time series models. This relates to covariance estimation for multivariate state space models 
when the observation covariance matrix is unknown. For this interesting model we present 
two numerical illustrations, consisting of simulated bivariate data and of US investment and 
change in inventory data. 

The paper is organized as follows. PSPP models are defined in Section [2j Sections [3] and 
H] apply PSPP modelling to regression and time series problems. The numerical illustrations 
are given in Section [5j Section [6] gives concluding comments and the appendix details the 
proof of a theorem of Section [2j 

2 Partially specified probability modelling 
2.1 Full probability modelling 

In Bayesian analysis, a full probability model for a random vector Z comprises the joint dis- 
tribution of all its elements. The forecast distribution of any function of Z is then just that 
function's marginal distribution. Learning or updating simply derives the conditional distri- 
bution of Z given the received information on the appropriate function of Z. For example, 
let Z = [X' Y']' , where X,Y are real valued random vectors, and the probability density 
function of Z be denoted by p(.). X will often be the vector comprising the parameters or 
states of the model and Y will be the vector comprising the observations of interest. The 
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model is precisely defined, if a density of Y given X is specified, e.g. p(Y\X) so that p{y\X) 
is the likelihood function of X based on the single observation Y = y. Then the one-step 
forecast distribution of Y is the marginal distribution of Y 

p(Y) = Jp(X,Y)dX, (1) 

where S is the space of X, also known as parametric space. When the value y of Y is observed, 
the revised density of X is 

p{y) 

from direct application of the Bayes theorem. 

Most Bayesian parametric regression and time series models (including linear and non- 
linear) adopt the above model structure and their inference involves the evaluation of integral 
(HD and the Bayes rule (J5J. 

However, in many situations, the evaluation of the above integral is not obtained in closed 
form and the application of rule ([2]) does not lead to a conjugate analysis, which is usually 
desirable in a sequential setting such as for time series application. For such situations, it is 
desirable to approximate only the mean and variance of X\Y = y. In this paper we consider 
the general problem of obtaining approximations of the first two moments of X\Y = y, when 
we only specify the first two moments of X and Y alone and not their joint distribution. We 
achieve this by replacing the full conditional independence structure, which is based on the 
joint distribution of X and Y, by second order independence, which is based on means and 
variances of X and Y. Our motivation is generated from the Gaussian case; suppose that X 
and Y have a joint normal distribution, then X — A xy Y and Y are mutually independent and 
the distribution of X\Y = y can be derived from the distribution of X — A xy Y, where A xy is 
the regression matrix of X on Y (for a definition of A xy see Section 12. 2p . So we can define 
a subclass of the Bayesian models of (pQ) and ([2]), where we can replace the strict mutual 
independence requirement by second order independence. Details appear in our definition of 
prior posterior probability models that follow. 



2.2 Posterior mean and variance approximation 

Let X £ M. m , Y € MP, W G M. q be any random vectors with a joint distribution (m,p,q G 
N — {0}). We use the notation E(X) for the mean vector of X, Var(X) for the covariance 
matrix of X and Cov(X, Y) for the covariance matrix of X and Y. We use the notation 
XJ-2Y to indicate that X and Y are second order independent, i.e. E(X|y = y) = M(X) 
and Var(X|y = y) = Var(X), for any value y of Y. Furthermore, we use the notation 
Xi^WlY to indicate that, given Y, X and W are second order independent, i.e. E(X|1/F = 
Wl Y = y) = E(X\Y = y) and \&i{X\W = w,Y = y) = Var(X|Y = y). Details on 
conditional independence can be found in Whittaker (1990) or Lauritzen (1996), who discuss 
independence in a much more sophisticated level necessary for the development of graphical 
models. 

Considering vectors X and Y as above, it is well known that X — A xy Y and Y are 
uncorrelated, where A xy = Cov(X, Y){Var(Y)} -1 is the regression matrix of X on Y. In 
order to obtain approximations of the posterior mean E(X|Y = y) and the posterior covariance 
matrix Var(X|Y = y) it is necessary to go one step further and assume that 

X - A xy Y± 2 Y, (3) 



3 



which of course implies that X — A xy Y and Y are uncorrelated. With \i x = E(X) and 
fly = E(Y), the prior means of X and Y, respectively, the above assumption is equivalent to 
the following two postulates. 

1. Given Y, the posterior mean E(X — A xy Y\Y = y) of X — A xy Y does not depend on the 
value of y of Y, so that the value of this mean must be the same for all values of Y, and 
so be equal to its prior expectation fj, x — A xy fi y . 

2. Given Y, the posterior covariance matrix Var(X — A xy Y\Y = y) of X — A xy Y does 
not depend on the value y of Y, so that this posterior covariance matrix takes the 
same value for all values y of Y and is necessarily equal to its prior covariance matrix 
Var(X - A xy Y). 

Thus it is possible to approximate E(X|Y = y) and Var(X|Y = y), since from the definition 
of second order independence (given above), we have 

E(X - A xy Y\Y = y)= E(X - A xy Y) => E(X\Y = y) - A xy y = fi x - A xyl x y 
E(X\Y = y) = (i x - A xy (y - fi y ), 
Var(X|Y = y) = Var(X - A xy Y\Y = y) = Varpf - A xy Y) 
= T, x + A xy T, y A xy — 2Cov(X, Y)A' xy = — A xy Y, y A' xy 

and so we write 

X\Y = y ~ {Hx + A xy {y - fJ, y ),T, x - A xy Y> y A' xy }, 

where S x = Var(X) and T, y = Var(Y). 

Therefore we can define models that have a prior/posterior updating facility that is based 
on second order independence and that can approximate the posterior mean and variance 
obtained from an application of the Bayes theorem when the full distributions are specified. 
Thus we have the following definition. 

Definition 1. Let X and Y be any vectors of dimensions m and p respectively and assume 
that it exists the joint distribution of Z = [X' Y']' . Let A xy be the regression matrix of X 
on Y . A first order partially specified prior posterior probability model for (X; Y) (notation: 
PSPP(l)), is defined such that: (a) X — A xy Y -L2Y and (b) for any value y ofY, the mean 
vector and the covariance matrix of X\Y = y are obtainable from the mean vector and the 
covariance matrix of X — A xy Y . 

We note that if X and Y have a joint normal distribution, then second order independence 
is guaranteed and in particular X — A xy Y and Y are mutually independent, which is much 
stronger than property (J3j) - In this case E(X|Y = y) and Var(X| Y = y) are the exact posterior 
moments, produced by an application of Bayes rule ([2]). It follows that the approximation of 
the first two moments reflects on the approximation of postulate ([3|). Thus the approximations 
of E(X|Y = y) and Var(X|Y = y) will be so accurate as the condition ([3]) is satisfied. The 
question is: as we depart from normality, how justified are we to apply ([3])? In order to answer 
this question and to support the adoption of ([3]), we give the next result, which states that 
Bayes linear estimation is equivalent to mean and variance estimation employing assumption 

©• 

Theorem 1. Consider the vectors X andY as above. Under quadratic loss, fi x + A xy (Y — fi y ) 
is the Bayes linear estimator if and only if X — A xy Y -L2 Y. 
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The proof of this result is given in the appendix. Thus, if one is happy to accept the 
assumptions of Bayes linear optimality, she has to employ (J3|) . Next we give three illustrative 
examples that show assumption ([3]) may be approximately satisfied. 



Example A: checking postulate ([3]) for the multivariate Student t distribu- 
tion 

Let X 6 M. m and Y £ MP be random vectors with a joint Student t distribution with n 
degrees of freedom (Gupta and Nagar, 1999, §4.2). For example the marginal density of X is 
the Student t distribution X ~ T m {n, fi x , Cu) with density function 



ir-P/ 2 n n / 2 T{(n+p)/2} 
r(n/2)|Cii|V2 



{ n + (x - fi x yc^(x - fi x )y 



(n+p)/2 



for fj, x = E(X) and Var(X) = nC\\/{n — 2), where T(.) denotes the gamma function and 
denotes determinant. 
Write 



X 
Y 



/'y 



Cll C\2 
C\2 C22 



for some known parameters /j, x , /J>y, Cu, C12, and C22- The regression coefficient of X on Y 
is A xy = C^iCoo so that 



12<^ 2 2 



X — A X yY 



Y 



~ %n+p 



n. 



Cu — A xy C22A' xy 
C 22 



Now for any value y of Y, the conditional distribution of X — A xy Y given Y = y is 
X-A xy Y\Y = y ~T m {n + p,fi x - A xy fj, y , (C n - A X yC 2 2A' xy ) [l + n~ l (y - ^ 2/ )C 22 1 (y - fi y )'] } . 



Thus for any n > 0, E(X — A xy Y\Y = y) = E(X — ^^^y), while for the variance, for n > 2, 
it is Var(X - A xy Y\Y = y) w n(n - 2)" 1 (Cn - ^C 22 ^) = Var(X - A xy Y). For large 
postulate X — A^y^^ is thought to be satisfactory. 



Example B: checking postulate ([3]) for the inverted multivariate Student t 
distribution 

The inverted Student t distribution is discussed in Dickey (1967), in Gupta and Nagar (1999, 
§4.4) and it is generated from a multivariate normal and a Wishart distribution as follows. 
Suppose that X* ~ Ap(0, I p ) and S ~ W p {n+p— 1, I p ), for some n > 0, where W p (n-\-p— 1, I p ) 
denotes a Wishart distribution with n + p — 1 degrees of freedom and parameter matrix I p ; 
this distribution belongs to the orthogonally invariant and residual independent family of 
distributions, discussed in Khatrie et al. (1991) and Gupta and Nagar (1999, §9.5). For a 
vector fi and a covariance matrix C we define X = n l / 2 C 1 l 2 {Y, + X*{X*)'}- 1 / 2 X* where 
C 1 / 2 denotes the symmetric square root of C . Then the density of X is 



P{X) 



T{(n + p)/2} 



7 rP/ 2 r(n/2)|C| 1 /2 n (P+n-2)/2 



{n-{X-ii)'C-\X-ii)} 



n/2-1 



This density defines the inverted multivariate Student t distribution and the notation used is 
X~Xr p (n,M,C r ). 
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Following a similar thinking as in Example A we have that 



X - A xy Y ~ 2T m (n, fi x - A xy fi y , C u - A xy C 2 2A' xy ) 
and conditioning on Y = y (Gupta and Nagar, 1999, §4.4) we obtain 

X - A xy Y\Y = y ~ TT m {n,n x - A xy fj, y , (C n - A xy C 2 2A' xy )[l - n" 1 (y - \x y )'C 22 {y - Ms/)]}- 

So we conclude that for large n the mean and variance of X — A xy Y\Y = y and X — A xy Y 
are approximately the same and thus X — A xy Y± 2 Y. 

Example C: checking postulate ((3]) for the Wishart distribution 

Suppose that £ = (£ij)ij=i,2 follows a Wishart distribution £ ~ W2(n,S) with density 

p(£) = {2 ri r 2 (n/2)|5r/ 2 }^ 1 |S|("- 3 )/ 2 exp j-itr^E 

where exp(.) denotes exponent, tr(.) denotes the trace of a square matrix, S = (5V/)ij=i,2> 
n > are the degrees of freedom and ^(x) = ^T(x)T(x — 1/2) denotes the bivariate 
gamma function. Let X = £12 and Y = £22 and assume that we observe Y = y so that 
E(y) = nS"22 ~ y- From the expected values of the Wishart distribution (Gupta and Nagar, 
1999, §3.3.6), we can write 



" X ' 




5l2 




Y 




,n 


S22 



S11S22 + s\ 2 
2S12S22 



25125*22 
25*22 



which, with A xy = S12/S22, yields K(X — A xy Y) = and Vai(X — A xy Y) = n(SnS22 — 5 2 2 ). 

From Gupta and Nagar (1999, §3.3.4), the posterior distribution of X\Y = y is X\Y = 
V ~ AA{5 12 y/5 22 , (S n - Sj 2 /S 2 2)y} leading to E(X - A xy Y\Y = y) = = E(X - A xy Y) 
and Var(X - A xy Y\Y = y) = Var(X|Y = y) = (S u - Sf 2 /S 2 2)y = (SuS 2 2 - Sf 2 )y/S 2 2 = 
Yai(X — A xy Y). Thus we can establish that X — A xy Y ^Y . 



Examples A and B show that PSPP(l) modelling can be regraded as approximation to 
the true posterior mean and variance, corresponding to the full probability model assuming 
the distribution of these examples. 

Returning to Definition [H there are situations where the prior mean vectors and covariance 
matrices of X and Y are available, conditional on some other parameters, the typical example 
being when the moments of X and Y are given conditional on a covariance matrix V. Then, 
as V is usually unknown, the purpose of the study is to approximate the posterior mean 
vector and covariance matrix of X\Y = y as well as to approximate the posterior mean 
vector and covariance matrix of V. In such situations postulate ([3]) reads X — A xy Y± 2 Y\V 
and another postulate for V is necessary in order to approximate the moments of X\Y = y, 
unconditionally of V. Regression problems of this kind are met frequently in practice, as V 
can represent an observation variance or volatility, which estimation is beneficial to accounting 
for the uncertainty of predictions. We can then extend Definition [1] to accommodate for the 
estimation of V. 
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Definition 2. Let X , V and Y be any vectors of dimensions m, r and p respectively and 
assume that it exists the joint distribution of Z = [X' V' Y']' . Let A xy be the regression matrix 
of X on Y, given V and let B vy the regression matrix of V on Y. A second order partially 
specified prior posterior probability model for (X,V;Y) (notation: PSPP(2)), is defined such 
that: (a) X — A xy Y ~ 1_2Y\V an d V — B vy Y ^Y and (b) for any value y ofY, the mean vector 
and the covariance matrix of X\V,Y = y and V\Y = y are obtainable from the mean vector 
and the covariance matrices of X — A xy Y and V — B vy Y , respectively. 

An example of PSPP(2) model is the scaled observational precision model, which is ex- 
amined in detail in Sections [3] and [U Next we discuss the differences of PSPP(2) and Bayes 
linear estimation when V is a scalar variance. 

Goldstein (1979, 1983), Wilkinson and Goldstein (1996) and Wilkinson (1997) examine 
some variants of this problem by considering variance modifications of the basic linear Bayes 
rule, considered in Hartigan (1969) and in Goldstein (1976). Below we give a basic description 
of the proposed estimators and we indicate the similarities and the differences of the proposed 
PSPP models and of the Bayes linear estimators. Consider a simple regression problem 
formulated as Y\X, V ~ (X, V), X ~ {E(X), Var(X)}, where Y is a scalar response variable, 
X is a scalar regressor variable and E(X), Var(X) are the prior mean and variance of X. If 
V is known the posterior mean E(X|V, Y = y) can be approximated by the Bayes linear rule 



E( X)V + yVar(X ) 
V + Var(X) 



with related posterior expected risk 



Var(A)F 
Var(X) + V 



R(p) = \ Tr = Var(X)(l - A xy ), 



where A xy = Var(X) /{Var(X) + V} is the regression coefficient of X on Y, conditional on V. 
As it is well known R{[i) is the minimum posterior expected risk, over all linear estimators 
for E(X|y = y), and in this sense [i attains Bayes linear optimality. If one assumes that 
the distributions of Y\X, V and X are normal distributions, then fi gives the exact posterior 
mean E(X|V, Y = y) and R(^) gives the exact posterior variance Var(X|V, Y = y). However, 
in practice in many problems, V is not known, and ideally the modeller wishes to estimate 
V and provide an approximation to the mean and variance of X\Y = y, unconditionally of 
V. Suppose that in addition to the above modelling assumptions, in order to estimate V, a 
prior mean E(V) and prior variance Var(y) of V are specified, namely V ~ {(E(V), Var(V)}. 
Goldstein (1979, 1983) suggest to estimate V with the Bayes linear rule 

= E(y)Var(y*)+j/*Vax(VO 

Var(y*) + Var(F) ' 1 ' 

where y* is an observation from Y* , a statistic that is unbiased for V, and Var(y*) is specified 
a priori. Then the Bayes rule fj, is replaced by the rule //*, where V in \x is replaced by its 
estimate V* . One can see that the revised regression matrix A* y becomes 

Var(A) Var(X)Var(y*) +Var(X)Var(y) 



xy Var(X) + V* Var(X)Var(y*) + Var(X)Var(F) + E(V)Var(Y*) + y*Yar(V) 
and so the variance modified Bayes rule for E(X|F = y) is fi* = E(A) + A* y {y — E(X)}. 



From Theorem [H it is evident that the Bayes rule (j3|) is equivalent to X — A xy Y \L2Y\V '. 
The Bayes rule ([5]) corresponds to the postulate V — B vy Y -L2Y , although the latter does not 
establish the equivalence of the PSPP models and Bayes linear estimation methods, since it 
can be verified that fj,* and V* are not the same as in the PSPP modelling approach (see 
Section E]). In addition, the roles of Y* and y* are not fully understood; for example one 
question is how y and y* are related and how one can determine y* from y, especially when 
y is a vector of observations. The main problem experienced in the variance modified Bayes 
linear estimator /i* is that the related expected risk -R(/U*) can not easily be determined 
and the work in this direction (Goldstein, 1979, 1983) has led to either intuitive evaluation 
for i?(/x*) or it has led to imposing even more restrictions to the model in order to obtain 
an analytic formula for R(n*). Although, both of these approaches can work in regression 
problems, they are not appropriate for time series problems, where sequential updating is 
required and thus an accurate evaluation of that risk is necessary. On the other hand the 
PSPP approach combines the two postulates, X — A xy Y±2Y\V and V — B vy Y±2Y, using 
conditional expectations. It should be noted that the PSPP treatment is free of most of 
the assumptions made to the variance modified Bayes linear system so that approximate 
estimation of the posterior Var(X|Y) be given. The PSPP models are developed mainly 
for multivariate regression and time series problems and they are aimed to situations that 
either a fully Bayesian model is not available, or computationally intensive calculations, such 
as Monte Carlo methods, are undesirable, or a model can only be specified via means and 
variances. 



3 The scaled observational precision model 
3.1 Main theory 

The scaled observational precision (SOP) model is a conjugate regression model, which illus- 
trates the normal dynamic linear model with observational variances, see for example West 
and Harrison (1997, §4.5). This model is widely used in practice because it is capable to han- 
dle the practical problem of unknown observation variances. Here we construct a PSPP (2) 
model and we compare it with the usual conjugate SOP model. 
Let V be a scalar variance, X £ W 71 , Y G W with 



X 
Y 



V 



I'll 



V 



Ayx^X 



A X y^y 



for some known fi x , fj, y , T, x and T, y . 

Assuming X — A xy YJ-2Y\V , the partially specified posterior is 

X\V, Y = y~{fi x + A xy {y - fi y ), V(E X - A xy E y A' xy )}. 

Let T be a, generally non-linear, function of Y, often taken as 

T=(Y- f ,y)'^ 1 (Y- f ly). 

Define K to be a a times the variance of T\V, for some a > 0, and A VT to be the regression 
coefficient of V on T, conditional on K. We assume V — A VT TJ-2Y, K with forecast 



T\V,K ~ (V,K/a) and Cav(T,V\K) = Vai(V\K), 



8 



where V\K ~ (V, A/77), which is r\ja times as precise as the conditional distribution of T, 
for some known V, a, r/, with 

1 1 

1 (rj + a)/a 

Given the observation T = r, and using V — A VT T '-L2Y, K with A VT = a/(rj + a) we have 

r/V + ar 



' V ' 




A~ | 


V 


A 






T 




V 


77 



E(V\K,T = t) =E(V\K) + 



a 



[r-E(T\K)\ 



77 + a r] + a 

Var(V|A,T = r) = Var(V\T = r) - Cov(V, T|A){Var(T| A")} _1 Cov(T, V\K) 



so that 



A K 2 r/a 
77 T] 2 K(t] + a) 

V\K,T = t ~ 



A" 



1 



a 



77 + a 



A' 



77 + a 



7]V + ar AT 



77 + a 77 + a 
Hence using conditional expectations, it follows that 

i]V + ar 



X\Y = y ~ <fi x + A xy (y - /j, y ), 



7? + a 



(6) 



(7) 



where r = (y - H y )'?, y l {y - H y ). 

3.2 Comparison with the conjugate normal/gamma model 

Now consider the relationship of the above model with standard normal conjugate models. A 
typical normal conjugate model with unknown scalar variance V, postulates the distribution 
of Z given V as 



X 
Y 



V ~ Mr 



nip 



Ar 



V 



Ay x Yj x 



A-xy^y 



-'v 

with the distribution of V as an inverse gamma so that vsjV ~ xt- Here M mp {., .) denotes 
the mp-dimensional normal distribution and xt denotes the chi-squared distribution with v 
degrees of freedom. Writing T = (Y — fj iy Y'E~ 1 (Y — n y ), the conditional distribution of T 
given V can be easily derived from the distribution of TV" 1 !!/ which is TV" 1 !^ ~ Xp- Then 
the posterior distribution of V^ 1 given Y = y is 

v) ex H - ^ 



p(T\V)p(l/V) 



cx 



from which it is deduced that, given Y = y, (us + t)^" 1 ^ = y ~ xl+p- The posterior 
distribution of X\Y = y is a multivariate Student t distribution based upon v -\- p degrees of 
freedom with 



X\Y = y ~ T m , |i/ + p,fi x + A xy (y - fi y ), 



VS + T 

v + p 



Axy^yAyx / 



VS + T 
V 



y ' = 1/ ~ xl+p, r=(y- ix y )'?> y 1 (y - 



(8) 
(9) 
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Note that, if V = vsjiy + p — 3), rj = v + p — 3, and a = 1, then the posterior mean vector 
and covariance matrix of (|7|) and ([8]) are identical. However, this is not consistent with the 
conjugate model since from the prior assumption us/V ~ Xu & ^ s 

HV\s) = — ^V, (u>2), 

for any p > 1 . 

If we want to adopt the same prior for V = vs/{v — 2) in both the PSPP and the conjugate 
models, then the respective posterior means for V will differ, i.e. 

( D — 1 I US 

E(V\Y = y,PSPP model) - E(V\Y = y, conjugate model) - 



(y - 2){v + p - 2) : 



where we have used rj = u+p — 3 and a = 1 as before. Note that if Y is a scalar response, e.g. 
p = 1, then the two variance estimates are identical. So the respective posterior variances of 
equations (J7|) and (jHJ) will differ accordingly only when p > 1 . 
From the posterior distribution of 1/V we have that 

2(r + us) 2 

Var(V|Y = y, conjugate model) = - — ■ — T — ■ — (10) 

[y + p- 2) 2 (v + p — 4) 

while, from equation ([6]), the respective posterior variance for the PSPP model is 

Var(V\K, Y = y, PSPP model) = — , (11) 

u + p — 2 

where we have used a = 1 and rj = v+p—3. If we choose A' = 2{t+us) 2 / {{v+p— 2){v-\-p— 4)}, 
then the two variances will be the same. Note that, irrespectively of the choice of K (given 
that K is bounded), as the degrees of freedom v tend to infinity, the variances of both 
equations (QUI) and (fTTj) converge to zero and so as v — > oo, V concentrates about its mean 
asymptotically degenerating. 



3.3 Application to time series modelling I 

The above ideas can be applied to time series modelling when interest is placed on the estima- 
tion of the observation or measurement variance. Consider, for example, the p-dimensional 
time series vector Yj, which at a particular time t sets 

Y t = B t X t + e t , e t ~(0,VZ), X t = C t X t _ l +cu t , uj t ~{Q,VW), (12) 

where Bt is a known p x m design matrix, Ct is a known m x m transition matrix and the 
innovation error sequences {et} and {cot} ar e individually and mutually uncorrelated. The 
p x p and mx m covariance matrices Z and W are assumed known, while the scalar variance 
V is unknown. Initially we assume 

X \V ~ (m , VP ) and V ~ (%, —) , 

for some known tuq, Po, Vq, Kq and 770. It is also assumed that a priori, Xq is uncorrelated with 
{et} and {uJt}- Denote with y t the information set comprising the observations y±, y%, . . . , yt- 
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Then the PSPP model described above, applies at each time t with [i x = Ctirit-i, fi y = ft = 
BtCtmt-i, E x = R t = C t Pt-\C[ + W and E y = Q t = B t R t B' t + Z, where m t _i and P t -\ are 
calculated with the same way at time t — 1, starting with t = 1. Given the regression 

matrix of Xt on Yt is A xy = At = RtB'tQ^ 1 , which is independent of V . It follows that 
V\y* ~ (Vt, K t /i]t). With a = 1, it is K t = K t ~\ and ^ = r] t -i + 1 so that 

rjtVt = rit-iVt-i + e' t Qt l e t , 

where e' t Q^ 1 et = n and et = yt — ft is the 1-step forecast error vector. The above estimate 
Vt approximates the variance estimate of the conjugate dynamic linear model (West and 
Harrison, 1997, §4.5), which, assuming a prior rit-iVt-iV -1 ^ ~ XtL-ij arrives at the 

posterior {^iV-i + r^V^y* ~ X^ t _ 1+P so that E(F|y*) = mV / [m + P ~ 3) « 7 t . The 
variance of V\y l in the conjugate model is 

(r/t - 2y(r lt -4) 

whereas the respective variance in the PSPP model is Var(l/|y i ) = K/rft, with = i^o- 
Although these two variances differ considerably, in the sense that in the conjugate model the 
variance of V\y l is a function of the data y l and in the PSPP model the variance of V\y l is 
only a function of time t and on the prior Kq, it can be seen that as t — > oo, both variances 
converge to zero and so in both cases V\y l concentrates about its mean Vt asymptotically 
degenerating. 

In the PSPP model, the posterior mean vector and covariance matrix of Xt\y l are given 
by X t \y l ~ (m t , VtPt), where m t = Ctm t -i + A t e t and Pt = Rt~ A t QtA' t . These approximate 
the respective mean vector and covariance matrix produced by the conjugate model, which, 
under the inverted gamma prior, results to the posterior Student t distribution: Xt\y l ~ 
%n{r]t,m t ,VtPt). 



4 The generalized observational precision model 



4.1 Main theory 

The generalization of the SOP model of Section [3] when V is a p x p variance-covariance 
matrix is not available and only special forms of conjugate SOP models are known (West 
and Harrison, 1997, Chapter 16). The problem is that since the dimensions of X and Y 
are different, it is not possible to scale the covariance matrix of X\V by V, because X has 
dimension m and V is a p x p matrix. This problem is discussed in detail in Barbosa and 
Harrison (1992) and Triantafyllopoulos (2007). Next we propose a generalization of the SOP 
model, in which, given V, we avoid to scale the covariance matrices of X and Y by V. 
This setting is more natural than the setting of the SOP, which considers the somewhat 
mathematically convenient variance scaling. 

Let V be a p x p covariance matrix, X E M m , Y £ R p with 



Z 



X 
Y 



V 



■'■y 



^y + V 



for some known fi x , fi y , T, x and T, y , not depending on V. Note that now we cannot gain a 
scaled precision model. Even if we assume prior distributions for Z\V and V, we can not 



11 



obtain the marginal distributions X\Y = y and V\Y = y in closed form, since the covariance 
matrices of X and Y are not scaled by V . 

Assuming X — A xy YJ-2Y\V, conditional on V, the partially specified posterior is 

X\V, Y = y~{ii, x + A xy (y - % ), Y> x - A xy (Z y + V)~ l A' xy }. (13) 

Define T = (Y — fJ, y )(Y — fi y )' — T, y and denote with vech(V) the column stacking operator 
of a lower portion of the symmetric positive definite matrix V. Given V, the forecast of T is 



vech(T)\V,K ~ \vech(V) 



K 



K 



a 



and 



Cov{vech(V),vech(T)} = — = Var{vech(V|iO}, 

V 

where a, rj are known positive scalars and K is a known {p(p + l)/2} x {p(p+l)/2} covariance 
matrix. With V the prior estimate of V and / p ( p +i)/2 the {p(p + l)/2} x {p(p+ l)/2} identity 
matrix, we have 



vech("^) 
vech(T) 



K 



vech(y) 
vech(i>) 



K 
V 



p(p+l)/2 



/ p(p+l)/2 



- / p(p+i)/2 (v + ot)a 1 I p ( p+ i)/2 



The regression matrix of vech(V) on vech(T) is A VT = a(r] + a) 1 ^ p ( p +i)/2- Assuming now 
that vech(y) — A VT vech(T)J-2T\K we obtain the posterior mean and covariance of V as 



E{vech(V)\K,T = r} = vech(y) + — — Ivech(r) - vech(y)) 

T] + a I 



and 



Var{vech(T/)|ET,T = r} = Var{vech(y)| J ft:} + A VT V&v{vech(T)\K}A' v , 



K 



rj + a 



so that 



vech(V)\K,T = t 



vech.(r]V + err) 



7] + a 7] + a I 
from which we see that the posterior mean of V can be written as 

r/V + ar 



(14) 



E(V\K,T = t) = V + 



a 



rj + a 



V 



r] + a 



We note that in general the regression matrix A xy in (|13p will be a function of V~ l and this 
adds more complications to the calculation of the mean and covariance matrix of X\Y = y. 
However, if we impose the assumption that Cov(X,Y\V) = ^4Var(y), where A is a known 
m x p matrix not depending on V, then A xy = A is independent of V and so we get 



X\Y 



1 



l^x ~\~ A xy (jJ fJ"u)j 

i] + a 



A xy [E y + r]V + ar) A' a 



(15) 



y . Given that K is bounded, as r/ — > oo, the covariance matrix 



where r = {y — ix y ) (y — jjL y )' — X 
of vech(V)\K, T = r converges to the zero matrix and so V\K,T = r concentrates about its 
mean E(V|i^, T = r) asymptotically degenerating. This can be a theoretical validation of the 
proposed procedure for the accuracy of the estimator of V, K(V\K, T = r) = (rjV + ar) /(r] + 
a). 



12 



4.2 Application to linear regression modelling 

A typical linear regression model sets 

Y = BX + e, e~(0,Y), X ~ (fi x ,Z x ), (16) 

where Y is a p-dimensional vector of response variables, B is a known pxm design matrix and 
e is a p-dimensional error vector, which is uncorrelated with the random m-dimensional vector 
X. The mean vector \x x and the covariance matrix T, x are assumed known and T, y = BT, X B' 
so that Var(Y) = BT, X B' + V. The covariance matrix of X and Y is Cov(X, Y) = E X B' and 
so the assumption Cov(X, Y) = J 4{Var(Y)}~ 1 , does not hold, since Var(Y) is a function of V. 
Thus the posterior mean vector and covariance matrix of equation fll5|) do not apply, since now 
Aj^ is stochastic in V. In order to resolve this difficulty next we propose an approximation 
that will allow computation of equation (|13j) . 

In order to proceed, we will need to evaluate + Y) _1 |Y = y} and VarjvechKS^ + 

Y) _1 }|Y = y}. Since we only have equation Q14|) and we have no information on the distribu- 
tion of Y, we can not obtain the above mean vector and covariance matrix. Here we choose 
to adopt an intuitive approach suggesting that 

V = E{(E v + V)- 1 \K,T = t} ^{Z y + E(V\K,T = t)}- 1 
= (rj + a) |(t7 + a)T, y + r)V + arj , 

V = Var[vech{(S y + V)~ l }\K,T = r] w Var{vech(£ y + Y)|K,T = r} = — ^— . 

The reasoning of this is as follows. Since lim^oo Var{vech(Y)|K, T = r} = 0, V concentrates 
about its mean and so we can write V ~ K(V\K,T = r), for sufficiently large rj. Then 
(E y + Y)" 1 w {T, y + E(V\K,T = t)}^ 1 . The covariance matrix of vech{(S y + Y) -1 } has 
been set approximately the same with the covariance matrix of vech(S y + Y) ensuring that 
for large rj, both covariance matrices converge to zero. 

The above problem of the specification of Y and Y can be generally presented as follows. 
Suppose that M is a bounded covariance matrix and assume that E(M) and Var{vech(M)} 
are finite and known. The question is, given only this information, can one obtain E(M _1 ) 
and Var{vech(Af For example one can notice that if M follows a Wishart or inverted 
Wishart distributions, then Y is approximately true. Formally, if M ~ W p (n,S) (M follows 
the Wishart distribution with n degrees of freedom and parameter matrix S, see e.g. Gupta 
and Nagar, 1999, Chapter 3), we have E(M) = nS and E(M" 1 ) = S~ l /{n - p - 1) = 
n{M(M)}~ 1 / (n — p—1), which implies E(M~ 1 ) Rj {E(M)}" 1 , for large n. If M ~ lW p (n, S) 
(M follows the inverted Wishart distribution with n degrees of freedom and parameter matrix 
S, see e.g. Gupta and Nagar, 1999, Chapter 3), we have E(M) = S/(n — 2p — 2) and 
so E(M _1 ) = (n - p - ^S^ 1 = (n - p - l){E(M)}~V( n - 2p - 2), which again implies 
E(M -1 ) Rj {E(il/)} _1 , for large n. Of course M might not follow Wishart of inverted Wishart 
distributions and in many practical situations we will not have access to the distribution of 
M. For general application we can verify that E(Af~ 1 ) ~ {E(M)} _1 , if and only if M and 
M" 1 are uncorrelated. The accuracy of the choice of Y is reflected on the accuracy of the 
one-step predictions, which is illustrated in Section [5.11 

We can now apply conditional expectations to obtain the mean vector and the covariance 
matrix of X\Y = y. Indeed from the above and equation (|13p we have 

E(X|Y = y) = fM x + E(A xy \Y = y)(y- fM y ) = fi x + E x B'V(y - % ). 
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For the covariance matrix Var(X|Y = y) we have 



E{V a v(X\V,Y = y)\Y = y} 



= X X -X X B'E{(E + V)- 1 \Y = y}BZ 



•x 



and 



Var{E(X|F,Y = y)\Y = y} 



Vai[vec{E x fl'(E„ + V)~\y - » y )}\Y = y] 
{(y - iiy)' ® X x B'}G p VG' p {(y - ^) ® 



where ® denotes Kronecker product, vec(-) denotes the column stacking operator of a lower 
portion of a matrix and G p is the duplication matrix, namely vecKS^+y)" 1 } = G p vech{(Sj,+ 



We note that the mean vector and covariance matrix of X\Y = y depend on the estimates 

V and V. A simple intuitive approach was employed in this section and next we give an 

assessment of this approach by simulation. In general, equation (|17p holds where V and V 
are any estimates of the mean vector and covariance matrix of (E„ + V) -1 |Y = y. 

4.3 Application to time series modelling II 

In this section we consider the state space model f|12|) . but the covariance matrices of the error 
drifts €t and cot are Var(e() = V and Var(cJt) = W. Here V is an unknown p x p covariance 
matrix and W is a known m x m covariance matrix. The priors are partially specified by 



for some known mo, Po, Vq, Kq and r/o- It is also assumed that a priori, Xq is uncorrelated 
with {et} and {oot\- Note that in contrast with model (|12h . the above model is not scaled 
by V and in fact any factorization of the covariance matrices by V would lead to restrictive 
forms of the model; for a discussion of this topic see Harvey (1989), Barbosa and Harrison 
(1992), West and Harrison, (1997, §16.4), and Triantafyllopoulos (2006a, 2007). Before we 
give the proposed estimation algorithm, we give a brief description of the related matrix- 
variate dynamic models (MV-DLMs) and the restrictions imposed in these models. 

Suppose {Yt} is a p-dimensional vector of observations, which are observed in roughly 
equal intervals of time t = 1,2,3,.... Write Y t = \Y\ t Y^t ••■ Y pt ]' , where each of Ya is 
modelled as a univariate dynamic linear model (DLM): 

Y it = B' t X it + e»t, X it = C t Xi^-\ + Lo it , e it ~ AA(0, an), u it ~ A/" m (0, auWi), 

where Bt is an ?n-dimensional design vector, Xa is an ?n-dimensional state vector, Ct is 
an m x m transition matrix and the error drifts and Un are individually and mutually 



v)- 1 }. 



Thus the mean vector and the covariance matrix of X\Y = y are 



X\Y = y 




(17) 
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uncorrelated and also they are uncorrelated with the state prior Xio, which is assumed to 
follow the normal distribution Xi ; o ~ A/" m (m, i o, Pi,o), for some known m^o and P^o- The mxm 
covariance matrix Wj is assumed known and the variances an, a 22 , ■ ■ ■ , &p P form the diagonal 
elements of the covariance matrix X] — (^j)i,j=i,2,...,p; which is assumed unknown and it is 
subject to Bayesian estimation under the inverted Wishart prior £ ~ TW p (no + 2p, n^So), 
for some known no and So- The model can be written in compact form as 

Yl = B' t X t + e' u X t = C t X t -i + UH, e t ~ A/p(0, £), vec(a; t ) ~ X mp (0, E <g> W), (18) 

where B[ = [B' lt B' 2t ■ ■ ■ B' pt ], X t = [X lt X 2t ■ ■ ■ X pt ], C t = diag(C u , C 2t , C pt ), vec(V ) ~ 
A/" mp {vec(m ), S <g> P }, for m = [mi,o ra 2 , ••• m P;0 ] and P = diag(Pi i0 , P2,o, ■ ■ ■ , -Pp.o)- 
Model (|18|) is termed as matrix-variate dynamic linear model (MV-DLM) and it is studied 
in Quintana and West (1987, 1988), Smith (1992), West and Harrison (1997, Chapter 16) 
Triantafyllopoulos and Pikoulas (2002), Salvador et al. (2003, 2004), Salvador and Gargallo 
(2004), and Triantafyllopoulos (2006a, 2006b); Harvey (1986, 1989) develop a similar model 
where £ is estimated by a quasi likelihood estimation procedure. The disadvantage of model 
(TT8|) is that Yu, Y 2t , ■ ■ ■ , Y pt are restricted to follow similar patterns since the model compo- 
nents Bt and Ct are common for all i = 1, 2, . . . ,p. One can notice that the only difference 
between Yu and Yj t (i 7^ j), is due to the error drifts en, u>u and ejt, ujjt- Thus, for example, 
model (fl~8j) is not appropriate to model Y t = [Yu Y^t]', where Y\ t is a trend time series and Y 2t 
is a seasonal time series. It follows that when there are structural changes between Yu and 
Yjt, the MV-DLM might be thought of as restrictive and inappropriate model and its use is 
not recommended. When p is large one can hardly justify the "similarity" of Yu, Y 2 t, ■ ■ ■ , Y p t. 
We believe that in practice the popularity of the MV-DLM is driven from its mathematical 
properties (fully Bayesian conjugate estimation procedures for sequential forecasting and fil- 
tering/smoothing), rather than from a data driven analysis. Although we accept that in some 
cases the MV-DLM can be a useful model, we would submit that in many time series problems 
this model is unjustifiable and the above discussion expresses our reluctance in suggesting the 
MV-DLM for general use for multivariate time series problems. 

Returning now to the PSPP dynamic model, denote with y 1 the information set comprising 
data y\,y 2 , ■ ■ ■ ,yt- If at time t — 1 the posteriors are partially specified by Xt-i\y l ~ l ~ 
(mt-i,Pt-\) and vech(F)|y* _1 ~ {vech(Vt-i), Tj^Kt-i}, for some known m t -i, Pt-i, Vt-i, 
Kt-i and rft-i, then by direct application of the theory of Section [J] we have for time t: 
u x = C t m t -i, E x = R t = CtPt~iC' t + W , u y = ft = B t C t m t -i, ^ y = B t R t B' t and A xy = 
At = RtB' t (BtRtB' t + V)~ l . The 1-step ahead forecast covariance matrix is Qt = Vax(Yt\y t ) = 
B t RtB' t + Vt-i and so we have Y t \y l ~ l ~ (ft, Qt)- Given Y t = yt, the error vector is et = yt. — ft 
and so the posterior mean of V|y* is 

rjtVt = rjt-iVt-i + e t e' t - B t R t B' t , 

where we have used a = 1. Thus it is 

vech(y)|y* ~ jvech(T4),— 



Vt 

where rjt = i]t-i + 1 and Kt = ^Q-i- It follows that Kf = K$ and therefore as t — > 00, V\y l 
concentrates about Vt asymptotically degenerating. By observing that B t RtB' t = Q t — Vt-\ 
and writing the updating of Vt recurrently, we get 

vt=vt- 1+ e -^^=v 0+ j: e -^i. 

1 = 1 
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By forming now the standardized 1-step ahead forecast errors e* = Q t ej, where Q t 
denotes the symmetric square root of , one can obtain a measure of goodness of fit, 
since e\ ~ (0,I P ). This can easily be implemented, by checking whether the mean of 
e\{e\)' ', e^e^y, . . . , e| (e*)' is close to I p or equivalently by checking that, for e% = [e* t e\ t ■ ■ ■ e* t ]', 
the mean of each (e*^) 2 , (e* 2 ) 2 ; • • • > (e*J 2 is close to 1 and e* t is uncorrelated with e| t , for all 
t and i / j. 

Applying the procedure adopted in linear regression, we have that the posterior mean 
vector and covariance matrix are given by X^y 1 ~ (m t ,Pt), with 

m t = C t mt-i + RtB' t V t e t 

and _ 

Pt = R t - R t B' t V t B t R t + (e' t ® R t B' t )G p V t G' p (e t ® 

where 

T4 = (BtRtBi + Vt)- 1 and K t = — . 

From 77^ = r]t-\ + 1 it follows that as limt_ i . tX3 r/t = oo it is lim^oo Vt = and so for large 
t the posterior covariance matrix P± can be approximated by Pt ~ Rt — RtB'tVtBtRt- This 
can motivate computational savings, since there is no need to perform calculations involving 
Kronecker products. 



5 Numerical illustrations 

In this section we give two numerical examples of the state space model considered in Section 



5.1 A simulation study 

We simulate 1000 bivariate time series under 3 state space models and we compare the 
performance of the proposed model of Section [4.31 (referred here as DLM1), of the MV-DLM 
discussed in !4.3l (referred here as DLM2) and of the general multivariate dynamic linear model 
(referred here as DLM3). Let Yj = \Y\t l^t]' be a bivariate time series. In the first state space 
model we simulate 1000 bivariate time series from the model 



Yt 



1 
1 



X t + e t , X t 



1 
1 



Xt^+ut, e t ~N 2 (0,V), cj t ~AA 2 (0,/ 2 ), (19) 



where Xt is a bivariate state vector and the remaining components are as in Section 
Initially we assume that Xq ~ A/^O, I2) and the covariance matrix V is 



=1,2 



1 2 

2 5 



which means that the variables Y\t and Yit are highly correlated. The generated time series 
{Yt} comprise two local level components, namely {Yit} and {Ybt}- We note that DLM3 is 
the correct model, since it is used to generate the 1000 time series. 
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Table 1: Performance of the PSPP dynamic model (DLM1), MV-DLM (DLM2) and the 
general bivariate dynamic model (DLM3) over 1000 simulated time series of two local level 
components (LL), one local level and one linear trend component (LT) and one local level and 
one seasonal component (LS). Shown are the average (over all 1000 simulated series) values 
of the mean square standard error (MSSE), of the mean square error (MSE), of the mean 
absolute error (MAE) and of the mean error (ME). 



type 


model 


MSSE 




MSE 




MAE 




ME 








yu 




yu 


y2t 


yu 


y2t 


yu 


2/2f 


LL 


DLM1 


0.905 


1.045 


2.536 


7.975 


1.521 


2.249 


-0.049 


-0.022 




DLM2 


1.009 


1.075 


2.556 


8.635 


1.259 


2.348 


0.012 


-0.004 




DLM3 


0.998 


1.022 


2.342 


7.894 


1.208 


2.238 


0.013 


0.008 


LT 


DLM1 


0.913 


1.057 


3.407 


13.017 


1.399 


2.784 


-0.157 


-0.276 




DLM2 


1.113 


1.075 


3.835 


16.105 


1.552 


3.170 


-0.003 


-0.106 




DLM3 


0.996 


0.993 


2.569 


11.221 


1.274 


2.614 


-0.093 


-0.320 


LS 


DLM1 


1.054 


0.953 


2.373 


7.897 


1.228 


2.235 


0.015 


0.119 




DLM2 


1.186 


2.829 


2.450 


200.963 


1.259 


10.755 


-0.006 


0.057 




DLM3 


0.982 


0.994 


2.361 


7.856 


1.224 


2.218 


0.017 


0.112 



In the second state space model we simulate 1000 time series from the model 



1 
1 



X t + e t , X t 



1 1 
1 



-Xt-i+Wt, e t ~N 2 {0,V), tot ~ AA 2 (0,/ 2 ), 



and the remaining components are as in (|19p . The generated time series from this model are 
time series comprising {Yu} as a local level component and {Y 2 t} as a linear trend component. 
Finally, in the third state space model, we simulate 1000 time series from the model 



Y 



1 
10 



X t + e t , X t 



1 

cos(7r/6) sin(7r/6) 
— sin(7r/6) cos(7r/6) 



(20) 



where et ~ A/^O, V), u>t ~ A/^O,^) and here X t is a trivariate state vector with initial 
distribution Xq ~ A/3 (0,i3) and the remaining components of the model are as in (|19p . The 
generated time series from this model are bivariate time series comprising {Yu} as a local 
level component and {Y 2 t} as a seasonal component with period 7r/3. Such seasonal time 
series appear frequently (Ameen and Harrison, 1984; Godolphin, 2001; Harvey, 2004). 

Tables Q] and [2] show the results. In Table [1] the three state space models (DLM1, DLM2 
and DLM3) are compared via the mean of squared standard 1-step forecast errors (MSSE), 
the mean square 1-step forecast error (MSE), the mean absolute 1-step forecast error (MAE) 
and the mean 1-step forecast error (ME). For a discussion of these measures of goodness of 
fit, known also as measures of forecast accuracy, the reader is referred to general time series 
textbooks, see e.g. Reinsel (1997) and Durbin and Koopman (2001). In a Bayesian flavour, 
goodness of fit may be measured via comparisons with MCMC methods (which provide the 
correct posterior destinies) or via Bayes monitoring systems, such as those using Bayes factors; 
see West and Harrison (1997). 
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Table 2: Performance of estimators of the covariance matrix V = (Vij)ij=i, 2 , produced by 
the PSPP dynamic model (DLM1) and the MV-DLM (DLM2). Shown are the average (over 
all 1000 simulated series; see Table [TJ values of each estimator for times t = 100, t = 200 and 
t = 500. 



type V = (Vij)ij = i >2 


DLM1 DLM2 
t = 100 


DLM1 DLM2 
t = 200 


DLM1 DLM2 
t = 500 


LL Vn = 1 

V 12 = 2 

^22 =5 


1.347 0.961 
2.352 1.047 
5.846 3.407 


1.072 0.954 
1.792 0.914 
4.332 2.874 


0.988 0.974 
2.087 1.113 
5.215 3.290 


LT V n = 1 
V 12 = 2 
V 22 = 5 


2.087 0.475 
3.169 0.463 
6.200 2.509 


1.599 0.647 
2.375 0.721 
4.627 2.718 


1.210 0.678 
2.217 0.802 
5.043 2.851 


LS V u = 1 

V 12 = 2 
V 22 = 5 


0.627 0.729 
1.497 0.887 
4.084 3.548 


0.782 0.851 
1.674 0.901 
4.104 11.439 


0.960 0.955 
1.872 0.907 
4.626 76.609 



Section 14.31 details how the MSSE has been calculated. Out of the three models we know 
that DLM3 is the correct model, since it is used to generate the time series data. For the 
local level components (LL), both DLM1 and DLM2 put good performances with the DLM2 
having the edge and being closer to the performance of the DLM3. This is expected, since as 
we noted in Section 14.31 when both time series components Y\t and Y 2 t are similar the MV- 
DLM (DLM2) has good performance. However, in the LT and LS time series components, 
where the two series Y\% and Y 2 t in each case, are not similar, we expect that the DLM2 will 
not perform very well. This is indeed confirmed by our simulations, for which Table [T] clearly 
shows that the performance of DLM1 is better than that of the DLM2. For example, for the 
LS component, the MSSE of the DLM1 is [1.054 0.953]', which is close to [1 1]', while the 
respective MSSE of the DLM2 is [1.186 2.829]'. 

Table [2] looks at the accuracy of the estimation of the covariance matrix V, for each model. 
For the LL components Vn = 1 is estimated better from DLM2, although for t = 500 the 
estimate from DLM1 is slightly better. For V± 2 = 2 and V 22 = 5, DLM2 produces poor results 
as compared to the DLM1. For example, even for t = 500 the estimate of V22 = 5 of the 
DLM2 is only 3.290, while the estimate of the DLM1 is 5.215. This phenomenon appears to 
be magnified when looking at the LT and LS components, where for example even at t = 500 
for the LT the estimate of V\ 2 = 2 and for the LS the estimate of V 22 = 5 are 0.802 and 
76.609, while the respective estimates from the DLM1 are 2.217 and 4.626. The conclusion 
is that the DLM1 produces a consistent estimation behaviour over a wide range of bivariate 
time series, while the DLM2 (matrix-variate DLM) produces acceptable performance when 
the component time series are all similar. 

It should be stated here that, the matrix-variate state space models of Harvey (1986) 
produce a similar performance with the DLM2; Harvey (1989) calls the above matrix-variate 
models as 'seemingly unrelated time series models' to indicate the similarity of the component 
time series. The models of Triantafyllopoulos and Pikoulas (2002) and Triantafyllopoulos 
(2006a, 2006b) and of many other authors (see the citations in Harvey, 1989; West and 
Harrison, 1997; Durbin and Koopman, 2001) can only accommodate for regression type state 
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Real data vs 1 -step forecasts 




i i i i i 

1950 1955 1960 1965 1970 

year 



Figure 1: US Investment and Change in Inventory time series yt = [yit V2t\' with its 1-step 
forecast mean f t = [f\ t f2t]'- The top solid line shows yu and the bottom solid line shows 
y2t', the top dashed line shows fu and the bottom dashed line shows fit- 

space models and for local level models. More general structures, such that of model (|20[) 
can only be dealt with via simulation-based methods, such as Monte Carlo simulation. For 
high-dimensional dynamical systems and in particular for observation covariance estimation, 
the proposal of PSPP state space model of Section 14.31 offers a fast and reliable approximate 
estimation procedure, which can be applied for a wide range of time series. 

5.2 The US investment and business inventory data 

We consider US investment and change in business inventory data, which are deseasonalised 
and they are measured quarterly into a bivariate time series (variable yu- US investment 
data and variable y^t- US change in inventory data) over the period 1947-1971. The data are 
fully described and tabulated in Liitkepohl (1993) and Reinsel (1997, Appendix A). The data 
are plotted in Figure [1] with their forecasts, which are generated by fitting the linear trend 
PSPP state space model 
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Y, 



1 
1 



X t + e t , X t 



1 1 
1 



X t _i+wt, e t ~(0,V), wt~(0,W t ), (21) 



where here we have not specified the distributions of and as normal and we have replaced 
the time- invariant W of Section [4.31 with a time-dependent Wt- Model (|2ip is a PSPP linear 
trend state space model, for which we choose the priors mo = [80.622 4.047]' (mean of 
[Yu Y 2t ]' for t = 1941 - 1956, indicated in Figured] by the vertical line), P = 1000/ 2 (weakly 
informative prior covariance matrix or low precision Pq 1 ps 0) and 



Vn 



66.403 
22.239 



22.239 
46.547 



which is taken as the sample covariance matrix of Y\t and Y 2 t, for the time period 1941-1955. 
The covariance matrix Wt measures the durability and the stability of the change or evolution 
of the states X t . Here we specify Wt with 2 discount factors, 8\ and 82, as follows. With G 
as the evolution matrix of Xt and A the discount matrix 



G 



1 1 
1 



A 



81 
8 2 



W t = A-^GPt-iG'A- 1 / 2 - GPt-iG', 



we have 



where Rt in the recursions of Section l4~3l is replaced by Rt = GPt-\G' + Wt- Although this 
discounting specification is not advocated by West and Harrison (1997, §6.4), it has been 
successfully used (McKenzie, 1974, 1976; Abraham and Ledolter, 1983, Chapter 7; Ameen 
and Harrison, 1985; Goodwin, 1997). 

The values of 8\ and #2 are chosen by experimentation. The above model gave the best 
result with a combination of discount factors 8\ = 0.2 and 82 = 0.4. The performance 
measures were MSSE = [1.001 1.101]', MSE = [111.165 66.941]', MAE = [6.718 6.855]' and 
ME = [0.076 1.725]'. Other combinations of 8± and 82 yield less accurate results, with the 
usual effect that one of the two series y±t and y2t is accurately predicted, but the other one 
series is badly predicted. This problem certainly arises when 8± = 82, which clearly indicates 
the need of multiple discounting. Also, Figure [2] plots the observation variance, covariance 
and correlation estimates in the time period 1956-1970. From this plot we observe that the 
variability of the change in inventory time series component %)2t is much larger than that of 
yit- The estimate of the observation correlation indicates the high cross-correlation between 
the two series. 



6 Discussion 



This paper develops a method for approximating the first two moments of the posterior 
distribution in Bayesian inference. This work is particularly appealing in regression and time 
series problems when the response and parameter distributions are only partially specified 
by means and variances. Our partially specified prior posterior (PSPP) models offer an 
approximation to prior /posterior updating, which is appropriate for sequential application, 
such as in time series analysis. The similarities and differences with Bayes linear methods 
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Posterior variance 



Posterior correlation 




1960 1965 1970 1960 1965 1970 

year year 

Figure 2: Posterior estimates of the observation covariance matrix V = (14j')j,.j'=i,2 and esti- 
mates of the correlation p = V12/V ViiV^- In the left panel graph, shown are: estimate of 
the variance Vu (solid line), estimate of the variance V12 (dashed line), and estimate of the 
variance V22 (dotted line). In the right panel graph, the solid line shows the estimate of p. 

are indicated and, although the authors do believe that Bayes linear methods offer a great 
statistical tool, it is pointed out that in some problems, considered in this paper and in 
particular for time series data, the PSPP modelling approach can offer advantages as opposed 
to Bayes linear methods. 

PSPP models are developed having in mind Bayesian inference for multivariate state space 
models when the observation covariance matrix is unknown and it is subject to estimation. 
This paper outlines the deficiency of the existing methods to tackle this problem and it is 
shown empirically that, for a class of important time series data, including local level, linear 
trend and seasonal components, PSPP generates much more accurate and reliable posterior 
estimators, which are remarkably fast and applicable to a wide range of time series data. US 
investment and change in inventory data are used to illustrate the capabilities of the PSPP 
state space models. 

Given the similarities of the PSPP with Bayes linear methods, it is believed that the 
applicability of the PSPP approach goes beyond the examples considered in this paper. For 
example one area that is only slightly touched, is inference for data following non-normal 
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distributions, other than the multivariate t, the inverted multivariate t, and the Wishart 
distributions. In this sense a more detailed comparison of PSPP with Bayes linear methods 
and in particular with Bayes linear kinematics (Goldstein and Shaw, 2004), should shed more 
light on the performance of PSPP. It is our purpose to consider such comparisons in a future 
paper. 
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Appendix 

Proof of Theorem^ (=►) By hypothesis E(X|Y) = fJ, x + A xy (Y - n y ) => E(X - A xy Y\Y) = 
fi x — A xy fiy = constant. Furthermore Var(X|Y) = E{(X — fj, x — A xy (Y — [i y ))(X — fi x — A xy (Y — 
Hy))'\Y} = E X - A xy T, y A' xy = constant Var(V - A xy Y\Y) = Var(V|Y) = constant. It 
follows that X - A xy Y± 2 Y. 

(<=) The assumption X — A xy Y J-2Y implies that E(X — AjyYlY) = [i constant =>■ 
E(V|Y) = A xy Y + /i, which is a linear function of Y . Given that E(X|Y) minimizes the 
quadratic prior expected risk and \x x + A xy (Y — fi y ) minimizes this risk among all linear 
estimators, it follows that E(X|Y) = fJ. x + A xy (Y — fj, y ). □ 
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